When signals are measured with physical sensors, they are corrupted by noise. To reduce this noise, low-pass filters are commonly applied to attenuate the high-frequency components of the input signal, regardless of whether they originate from noise or from the actual signal. Therefore, low-pass filters must be carefully tuned to avoid significant degradation of the signal. This tuning requires prior knowledge about the signal, which is often unavailable in applications such as reinforcement learning or learning-based control. To overcome this limitation, we propose an adaptive low-pass filter based on Gaussian process regression. By considering a constant window of previous observations, update and prediction rates fast enough for real-world filtering applications are achieved. Moreover, online optimization of the hyperparameters adapts the low-pass behavior, such that no prior tuning is necessary. We show that the estimation error of the proposed method is uniformly bounded, and demonstrate the flexibility and efficiency of the approach in several simulations.
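The abstract does not give the authors' implementation, but the core idea of filtering by Gaussian process regression over a constant window of recent observations can be sketched as follows. This is a minimal illustration with an assumed squared-exponential kernel and fixed hyperparameters, not the paper's online-optimized method:

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel: variance * exp(-(a - b)^2 / (2 * lengthscale^2))
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_filter(t, y, noise=0.1, window=20, lengthscale=2.0):
    """Filter a noisy signal by GP regression over a sliding window of the
    most recent observations; returns the posterior mean at each time step."""
    filtered = np.empty_like(y)
    for i in range(len(y)):
        lo = max(0, i - window + 1)
        tw, yw = t[lo:i + 1], y[lo:i + 1]
        # Posterior mean: k(t_i, T_w) (K + noise^2 I)^{-1} y_w
        K = rbf_kernel(tw, tw, lengthscale) + noise ** 2 * np.eye(len(tw))
        k_star = rbf_kernel(t[i:i + 1], tw, lengthscale)[0]
        filtered[i] = k_star @ np.linalg.solve(K, yw)
    return filtered
```

Because the window length is constant, each update costs a fixed-size linear solve, which is what makes constant update and prediction rates possible.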
The importance of rehabilitation robotics in clinical applications has grown due to its therapeutic benefits and its ability to relieve labor-intensive work. However, its practical utility depends on the deployment of appropriate control algorithms that adapt the level of task assistance to the needs of each patient. Typically, the required personalization is achieved through manual tuning by clinicians, which is cumbersome and error-prone. In this work, we propose a novel online-learning control architecture that is able to personalize the control force at run time. To this end, we deploy Gaussian-process-based online learning with previously unseen prediction and update rates. Finally, we evaluate our method in an experimental user study, where the learning controller is shown to provide personalized control while also maintaining safe interaction forces.
Model calibration measures the agreement between a model's predicted probability estimates and the true likelihood of correctness. Proper model calibration is vital for high-risk applications. Unfortunately, modern deep neural networks are poorly calibrated, compromising trustworthiness and reliability. Medical image segmentation particularly suffers from this due to the natural uncertainty of tissue boundaries, which is exacerbated by loss functions that favor overconfidence in the majority classes. We address these challenges with DOMINO, a domain-aware model calibration method that leverages the semantic confusability and hierarchical similarity between class labels. Our experiments demonstrate that DOMINO-calibrated deep neural networks outperform non-calibrated models and state-of-the-art morphometric methods in head image segmentation. Our results show that our method consistently achieves better calibration, higher accuracy, and faster inference times than these methods, especially on rarer classes. This performance is attributed to our domain-aware regularization, which informs semantic model calibration. These findings show the importance of semantic ties between class labels in building confidence in deep learning models. The framework has the potential to improve the trustworthiness and reliability of generic medical image segmentation models. The code for this paper is available at: https://github.com/lab-smile/domino.
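The abstract does not specify how the class-similarity structure enters the loss, but one common way a similarity-aware regularization can work is by softening the one-hot training targets toward semantically similar classes. The sketch below is an illustration of that general idea, not DOMINO's actual formulation; the `sim` matrix and `alpha` weight are hypothetical stand-ins for the hierarchical label relationships the paper derives:

```python
import numpy as np

def similarity_smoothed_targets(labels, sim, alpha=0.1):
    """Soft training targets that move a fraction `alpha` of the probability
    mass from the true class onto semantically similar classes, as encoded
    by a (hypothetical) class-similarity matrix `sim` of shape (C, C)."""
    n_classes = sim.shape[0]
    onehot = np.eye(n_classes)[labels]
    # Normalize each similarity row into a probability distribution.
    sim_dist = sim / sim.sum(axis=1, keepdims=True)
    return (1 - alpha) * onehot + alpha * sim_dist[labels]
```

Training against such targets penalizes overconfident mistakes between dissimilar classes more than confusions between similar ones, which is the intuition behind domain-aware calibration.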
We outline emerging opportunities and challenges for enhancing the utility of AI for scientific discovery. The distinct goals of AI for industry versus AI for science create a tension between identifying patterns in data and discovering patterns in the world from data. If we address the fundamental challenges associated with "bridging the gap" between domain-driven scientific models and data-driven AI learning machines, then we expect these AI models can transform hypothesis generation, scientific discovery, and the scientific process itself.
Medical AI has tremendous potential to advance healthcare by supporting evidence-based medicine, personalizing patient treatment, reducing costs, and improving provider and patient experience. We argue that unlocking this potential requires a systematic way to measure the performance of medical AI models on large-scale heterogeneous data. To meet this need, we are building MedPerf, an open framework for benchmarking machine learning in the medical domain. MedPerf will enable federated evaluation, in which models are securely distributed to different evaluation facilities, thus empowering healthcare organizations to assess and verify the performance of AI models in an efficient and human-supervised process while prioritizing privacy. We describe the current challenges the healthcare and AI communities face, the need for an open platform, the design philosophy of MedPerf, its current implementation status, and our roadmap. We call for researchers and organizations to join us in creating the MedPerf open benchmarking platform.
We demonstrate a proof-of-concept of a large language model conducting corporate lobbying related activities. We use an autoregressive large language model (OpenAI's text-davinci-003) to determine if proposed U.S. Congressional bills are relevant to specific public companies and provide explanations and confidence levels. For the bills the model deems as relevant, the model drafts a letter to the sponsor of the bill in an attempt to persuade the congressperson to make changes to the proposed legislation. We use hundreds of ground-truth labels of the relevance of a bill to a company to benchmark the performance of the model, which outperforms the baseline of predicting the most common outcome of irrelevance. However, we test the ability to determine the relevance of a bill with the previous OpenAI GPT-3 model (text-davinci-002), which was state-of-the-art on many language tasks until text-davinci-003 was released on November 28, 2022. The performance of text-davinci-002 is worse than simply always predicting that a bill is irrelevant to a company. These results suggest that, as large language models continue to improve core natural language understanding capabilities, performance on corporate lobbying related tasks will continue to improve. We then discuss why this could be problematic for societal-AI alignment.
In the past years, deep learning has seen an increase of usage in the domain of histopathological applications. However, while these approaches have shown great potential, in high-risk environments deep learning models need to be able to judge their own uncertainty and be able to reject inputs when there is a significant chance of misclassification. In this work, we conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of Whole-Slide-Images under domain shift using the H&E stained Camelyon17 breast cancer dataset. Although it is known that histopathological data can be subject to strong domain shift and label noise, to our knowledge this is the first work that compares the most common methods for uncertainty estimation under these aspects. In our experiments, we compare Stochastic Variational Inference, Monte-Carlo Dropout, Deep Ensembles, Test-Time Data Augmentation as well as combinations thereof. We observe that ensembles of methods generally lead to higher accuracies and better calibration and that Test-Time Data Augmentation can be a promising alternative when choosing an appropriate set of augmentations. Across methods, a rejection of the most uncertain tiles leads to a significant increase in classification accuracy on both in-distribution as well as out-of-distribution data. Furthermore, we conduct experiments comparing these methods under varying conditions of label noise. We observe that the border regions of the Camelyon17 dataset are subject to label noise and evaluate the robustness of the included methods against different noise levels. Lastly, we publish our code framework to facilitate further research on uncertainty estimation on histopathological data.
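The rejection mechanism the abstract describes — discarding the most uncertain tiles to raise accuracy on the rest — can be sketched independently of any particular model. The snippet below assumes uncertainty is measured as the predictive entropy of the mean class distribution over T stochastic forward passes (e.g. MC Dropout samples or ensemble members); the function names and the 20% default rejection fraction are illustrative, not taken from the paper:

```python
import numpy as np

def predictive_entropy(mc_probs):
    """Uncertainty from T stochastic forward passes: entropy of the mean
    class distribution. `mc_probs` has shape (T, N, C)."""
    mean = mc_probs.mean(axis=0)                       # (N, C)
    return -(mean * np.log(mean + 1e-12)).sum(axis=1)  # (N,)

def reject_most_uncertain(mc_probs, reject_frac=0.2):
    """Keep the (1 - reject_frac) lowest-entropy tiles and return their
    indices together with the corresponding mean-probability predictions."""
    ent = predictive_entropy(mc_probs)
    n_keep = int(np.ceil(len(ent) * (1 - reject_frac)))
    keep = np.argsort(ent)[:n_keep]
    preds = mc_probs.mean(axis=0).argmax(axis=1)
    return keep, preds[keep]
```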
In large-scale machine learning, recent works have studied the effects of compressing gradients in stochastic optimization in order to alleviate the communication bottleneck. These works have collectively revealed that stochastic gradient descent (SGD) is robust to structured perturbations such as quantization, sparsification, and delays. Perhaps surprisingly, despite the surge of interest in large-scale, multi-agent reinforcement learning, almost nothing is known about the analogous question: Are common reinforcement learning (RL) algorithms also robust to similar perturbations? In this paper, we investigate this question by studying a variant of the classical temporal difference (TD) learning algorithm with a perturbed update direction, where a general compression operator is used to model the perturbation. Our main technical contribution is to show that compressed TD algorithms, coupled with an error-feedback mechanism used widely in optimization, exhibit the same non-asymptotic theoretical guarantees as their SGD counterparts. We then extend our results significantly to nonlinear stochastic approximation algorithms and multi-agent settings. In particular, we prove that for multi-agent TD learning, one can achieve linear convergence speedups in the number of agents while communicating just $\tilde{O}(1)$ bits per agent at each time step. Our work is the first to provide finite-time results in RL that account for general compression operators and error-feedback in tandem with linear function approximation and Markovian sampling. Our analysis hinges on studying the drift of a novel Lyapunov function that captures the dynamics of a memory variable introduced by error feedback.
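The error-feedback mechanism described above can be illustrated on a single TD(0) step with linear function approximation. This is a schematic sketch under assumed choices (top-k as the general compression operator, fixed step size), not the paper's analysis: the part of the update direction removed by compression is stored in a memory variable and added back before the next compression, so no information is permanently lost:

```python
import numpy as np

def topk_compress(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def compressed_td_step(theta, memory, phi_s, phi_next, reward,
                       gamma=0.9, lr=0.1, k=1):
    """One TD(0) step with linear value function V(s) = theta . phi(s),
    a top-k compressed update direction, and error feedback: the residual
    discarded by compression is carried in `memory` to the next step."""
    td_error = reward + gamma * phi_next @ theta - phi_s @ theta
    direction = lr * td_error * phi_s + memory
    compressed = topk_compress(direction, k)
    memory = direction - compressed  # residual fed back next step
    return theta + compressed, memory
```

The same memory variable is what the Lyapunov-drift analysis mentioned in the abstract has to track alongside the iterate.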
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually-degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack, with higher level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervision to loosely supervise the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and re-configurable communications.
Research on automated essay scoring has become increasingly important because it serves as a method for evaluating students' written responses at scale. Scalable methods for scoring written responses are needed as students migrate to online learning environments, resulting in the need to evaluate large numbers of written-response assessments. The purpose of this study is to describe and evaluate three active learning methods that can be used to minimize the number of essays that must be scored by human raters while still providing the data needed to train a modern automated essay scoring system. The three active learning methods are the uncertainty-based, the topological-based, and the hybrid method. These three methods were used to select essays included as part of the Automated Student Assessment Prize competition, which were then classified using a scoring model trained with a Bidirectional Encoder Representations from Transformers (BERT) language model. All three active learning methods produced strong results, with the topological-based method producing the most efficient classification. Growth rate accuracy was also evaluated. The active learning methods produced different levels of efficiency under different sample size allocations but, overall, all three methods were highly efficient and produced classifications that were similar to one another.
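The uncertainty-based selection step can be sketched generically. The abstract does not say which uncertainty criterion was used, so the example below assumes least-confidence sampling, one standard choice: essays whose top predicted score probability is lowest are routed to human raters first. The function name and budget parameter are illustrative:

```python
import numpy as np

def least_confidence_sample(probs, budget):
    """Uncertainty-based active learning selection: given predicted score
    probabilities `probs` of shape (N, C), return the indices of the
    `budget` essays where the scoring model is least confident."""
    confidence = probs.max(axis=1)
    return np.argsort(confidence)[:budget]
```

Human labels for the selected essays are then added to the training set and the scoring model is retrained, repeating until the labeling budget is exhausted.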